HOW DIFFERENT MENTAL TESTS AGREE IN
RATING CHILDREN 



WALTER S. GUILER 
Miami University 



During the second half of the past school year an extensive 
program of mental and educational measurements was undertaken 
under the writer's supervision in the William McGuffey Training 
School, Teachers College, Miami University. This paper will 
deal with only one section of this testing program, namely, how 
different tests agree in rating children. 

The study is based on four mental ratings of each of sixty-three 
children in the sixth, seventh, and eighth grades of the training 
school. The mental rating employed in this study is the intelligence 
quotient (I.Q.). The limited number of cases is due to the fact 
that complete and extensive data were available for upper-grade 
children only. The mental measures of these children were deter- 
mined by the use of the four mental tests and scales which are 
described briefly in the following paragraphs. 

1. The Stanford Revision of the Binet Scale. — This is an indi- 
vidual scale. It is so constructed that each successfully completed 
test signifies a specified number of months of mental age. The 
pupil's mental-age score is the total number of months of mental 
age which accrue from the tests which he is able to complete. The 
pupil's intelligence quotient (I.Q.) is determined by dividing his 
mental age in months by his chronological age in months. In 
order to conserve space we shall hereafter speak of this scale as 
the Terman Scale. 

2. The National Intelligence Tests. — These are group tests, 
appearing in different forms. Scale A, Form I, and Scale B, 
Form I, were employed in this investigation. The scores resulting 
from the use of the two forms are considered more reliable than 
the scores resulting from the use of only one form. These tests 
make no provision for converting the absolute scores into mental
ages and intelligence quotients. However, there is statistical evidence¹
to support the claim that for all practical purposes the
coefficient of brightness (C.B.) is comparable to the intelligence
quotient (I.Q.). Hence, for the purpose of this study the absolute
scores of the National Intelligence Tests were converted into 
coefficients of brightness by means of the scheme which Otis 
provides. The coefficient of brightness was then considered the 
equivalent of the intelligence quotient. 

3. The Illinois Examination. — This is a group test. It attempts 
to measure both mentality and the achievement in school work 
which may be expected to accompany a given mentality. This
test provides its own scheme for converting the absolute score into 
mental age. The I.Q. is determined by dividing the mental age 
by the chronological age. 

4. The Pintner Non-Language Tests. — These also are group 
tests. Like the Illinois Examination, they attempt to measure 
both mentality and the achievement which may be expected to 
accompany a given mentality and provide a scheme for converting
the absolute scores into intelligence quotients.
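
The quotient used with the Terman Scale and the Illinois Examination can be sketched as follows. This is a minimal illustration only: the function name and the sample ages are hypothetical, and the multiplication by 100 is an assumption drawn from the way the ratings are reported (for example, 139).

```python
def intelligence_quotient(mental_age_months: int, chronological_age_months: int) -> float:
    """Mental age divided by chronological age, expressed on a base of 100."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months


# Hypothetical pupil: mental age 13 years (156 months), chronological age 12 years (144 months).
print(round(intelligence_quotient(156, 144)))  # 108
```

A pupil whose mental age equals his chronological age receives a quotient of 100 under this convention.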

This investigation was carried out for the most part as an 
integral part of a course in mental and educational measurements. 
In the main, the tests were administered by the writer. Some 
assistance was given by individuals with special training for the 
work. Practically all of the scoring of the tests was done by the 
class in measurements under the direction of the writer. 

Table I presents the intelligence quotients and the intelligence 
rank of sixty-three pupils as determined by the four measuring 
instruments. The pupils' numbers are determined by the descend- 
ing order of the I.Q.'s which resulted from the measurement of the 
pupils by the Terman Scale. Thus the pupil with the highest 
I.Q. (139) on the Terman Scale is Pupil 1. Table I is read as
follows: Pupil 1 is rated 139 I.Q. by the Terman Scale; 120 I.Q.
by the National Intelligence Tests; 141 I.Q. by the Illinois Examination;
and 155 I.Q. by the Pintner Non-Language Tests. The
same pupil is ranked 1 by the Terman Scale, 4 by the National

¹ Otis Group Intelligence Scale: Manual of Directions, pp. 34-35. Yonkers-on-Hudson,
New York: World Book Co., 1919.

TABLE I

Intelligence Quotients and Intelligence Rank of Sixty-three Pupils
Measured by Four Mental Tests

                        Terman              National            Illinois            Pintner
                        Scale               Intelligence Tests  Examination         Non-Language Tests
Pupil                   I.Q.    Rank        I.Q.    Rank        I.Q.    Rank        I.Q.    Rank

1                        139       1         120       4         141       3         155       4
2                        138     2.5         122     2.5         151       2         166       1
3                        138     2.5         112     9.5         126     9.5         119      35
4                        133       4         112     9.5         124    11.5         108      48
5                        132       5         114     7.5         154       1         156     2.5
6                        130       6         122     2.5         133       7         123    29.5
7                        123       7         114     7.5         111      29         125      26
8                        122       8         108      16         121    15.5         100      58
9                        121       9         104    23.5         114    24.5         122    31.5
10                       120      10          99    34.5         120    18.5         141       5
11                       118      11          97      38         113    26.5         107      49
12                       117      13         140       1         137       5         122    31.5
13                       117      13         111    11.5         127       8         125      26
14                       117      13         110    13.5         121    15.5         137      10
15                       115      15          85      57          96      49         117    38.5
16                       114    16.5         111    11.5         124    11.5         120      34
17                       114    16.5          98      36         116    22.5         125      26
18                       113    18.5         110    13.5         139       4         131      18
19                       113    18.5         104    23.5         120    18.5         127      23
20                       112      20         106    19.5         121    15.5         156     2.5
21                       111      21         109      15         116    22.5         133      16
22                       108      23         106    19.5         136       6         139       7
23                       108      23         118       5         108    33.5         126      24
24                       108      23         107      17         108    33.5         121      33
25                       105    25.5          97      38         119      20         134    14.5
26                       105    25.5          95      42         107      36         129      21
27                       104      27         115       6         121    15.5         136      12
28                       103    28.5          94      45          96      49         132      17
29                       103    28.5          91      49         113    26.5         112    44.5
30                       102      30          96      40          96      49         102      56
31                       101      31         103    26.5         100    43.5         134    14.5
32                       100      33         103    26.5         109    30.5         118    36.5
33                       100      33         102      29         114    24.5         109      47
34                       100      33         105      22          93      53         112    44.5
35                        99      36         101      31         104    39.5         123    29.5
36                        99      36          93      47         112      28         117    38.5
37                        99      36          88      54          83    57.5         114      41
38                        98    38.5         106    19.5          91      54         138       8
39                        98    38.5         103    26.5         123      13         128      22
40                        97      40          90      51          96      49         112    44.5
41                        96    41.5         106    19.5         117      21         130    19.5
42                        96    41.5         100      33         126     9.5         101      57
43                        95      43          88      54         104    39.5         140       6
44                        91      45          95      42         109    30.5         118    36.5
45                        91      45          94      45         100    43.5         112    44.5
46                        91      45          92      48         100    43.5         135      13
47                        90    47.5         103    26.5         108    33.5         137      10
48                        90    47.5         101      31         108    33.5         137      10
49                        ..    49.5          95      42          96      49         116      40
50                        ..    49.5          90      51          83    57.5         104    51.5
51                        87      51          94      45          88      55         113      42
52                        86      53          99    34.5         105      37         106      50
53                        86      53          97      38         104    39.5         103      54
54                        86      53          90      51          96      49          77      62
55                        84      55          82      59          80      60          94      59
56                        83    56.5          88      54          96      49          84      61
57                        83    56.5          78    61.5          83    57.5          86      60
58                        82      58          83      58          76      61         124      28
59                        81    59.5          86      56         104    39.5         130    19.5
60                        81    59.5         101      31         100    43.5         103      54
61                        79      61          81      60          83    57.5         104    51.5
62                        70      62          60      63          66    62.5         103      54
63                        64      63          78    61.5          66    62.5          65      63

Median                    ..                 101                 108                  ..
Middle 50 per cent    90-114              92-128              96-121             108-133



Intelligence Tests, 3 by the Illinois Examination, and 4 by the 
Pintner Non-Language Tests. 
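
The fractional ranks in Table I (for example, 2.5 for each of two pupils tied at 138 I.Q.) are consistent with giving tied pupils the mean of the positions they occupy. A minimal sketch of that convention follows; the function name is an assumption for illustration, and the paper does not state the ranking rule explicitly.

```python
def average_ranks(scores):
    """Rank scores in descending order; tied scores share the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Terman I.Q.'s of Pupils 1 to 6 as given in Table I.
terman = [139, 138, 138, 133, 132, 130]
print(average_ranks(terman))  # [1.0, 2.5, 2.5, 4.0, 5.0, 6.0]
```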

The purpose of Table I is to present the original data of the 
investigation. The table serves also to indicate in a rough way 
how well the different mental tests agree in rating and ranking 
children. Inspection of the table will show that the largest amount 
of agreement in mental ratings is found when only two tests are 
considered. Less agreement is found when three tests are con- 
sidered, and still less agreement is found in the case of all four 
tests. It is interesting to note that in only two instances (Pupils 
30 and 57) do all four tests agree in placing any pupil within a 
range of ten I.Q. steps.

A surprisingly close agreement is noted among the different 
tests in their ranking of certain children. Thus all four tests rank 
seven pupils within a range of five steps. Three tests (Terman,
National, and Pintner) rank seventeen pupils within the same 
range, while two tests (Terman and National) rank twenty-eight 
pupils within this range. On the other hand, one is amazed at
the extent of disagreement existing among the different tests in
ranking some of the children. Pupil 8 is ranked eighth from the 
top of the group by the Terman Scale, and he is ranked sixth from 
the bottom by the Pintner test. There is also great diversity in 
the ranking of Pupils 4, 11, 15, 28, 34, 38, 42, 43, 46, 47, 48, 58, and 
59. More detailed statements of agreement in both the rating and 
ranking of children by the different mental tests employed in this 
investigation appear in the presentation and discussion of the tables 
which follow. 

The outstanding fact in Table II is the high correlation existing 
between some of the distributions of measures derived from the 
use of the mental tests. However, these high correlations are not 
as significant as they might at first appear. They are high because 

TABLE II

Agreement among Mental Measures Expressed in
Terms of Correlation

Tests                          Correlation in Scores

Terman and Illinois . . . .            .88
National and Illinois . . .            .81
Terman and National . . . .            .75
Illinois and Pintner  . . .            .51
Terman and Pintner  . . . .            .40
National and Pintner  . . .            .39



the instances where one test measures higher than another in number
of cases and in steps of I.Q. are counterbalanced by instances
where the opposite is true. It is only as we analyze these mental 
measures, making comparisons between ratings of individual pupils, 
that we can really discover agreement and displacement. This is all 
the more important since the problem of school administration is 
the problem of dealing with the individual in the mass. 
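
The point can be illustrated with a short sketch that computes a correlation coefficient alongside the pupil-by-pupil differences for the same two tests. The product-moment formula is an assumption here, since the paper does not say which coefficient was used, and the eight pupils taken from Table I are only a sample, so the result is not expected to reproduce the figure in Table II.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Terman and Illinois I.Q.'s of Pupils 1 to 8 from Table I.
terman = [139, 138, 138, 133, 132, 130, 123, 122]
illinois = [141, 151, 126, 124, 154, 133, 111, 121]

r = pearson_r(terman, illinois)
# Displacement of each pupil's rating, in I.Q. steps, regardless of direction.
displacement = [abs(a - b) for a, b in zip(terman, illinois)]
print(round(r, 2), displacement)
```

A single coefficient summarizes the whole distribution, while the list of displacements shows how far apart the two ratings of any one pupil may lie.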

Tables III, IV, and V indicate how well the measuring scales 
agree in rating children when agreement is expressed in terms of 
the range of I.Q. steps. 

Table III sets forth the agreement expressed in terms of the 
range of I.Q. steps when all four measuring instruments are considered.
The table reads as follows: all four tests rate no pupils
within a range of 0 to 4 I.Q. steps; they rate two pupils within a
range of 5 to 9 I.Q. steps, etc.

Table IV shows the agreement in mental ratings expressed in 
terms of I.Q. steps when any three of the measuring instruments 
are under consideration. The second column will serve to illustrate 
how this table should be read. The Terman, National, and Illinois 
tests rank four pupils within a range of 0 to 4 I.Q. steps; thirteen
pupils within a range of 5 to 9 I.Q. steps, etc. In other words, the 

TABLE III

Amount of Agreement in Range of I.Q. Steps Shown
in Ratings of Sixty-three Children
by Four Mental Tests

Range of I.Q. Steps          Frequency

0-4                               0
5-9                               2
10-14                             8
15-19                             7
20-24                            13
25-29                             9
30-34                             9
35-39                             3
40-44                             5
45-49                             5
50-54                             2

Total                            63
Median                           25
Middle 50 per cent            19-34

difference is shown between the lowest I.Q. rating and the highest 
I.Q. rating yielded by the Terman, National, and Illinois tests. 

An analysis of the data on which Table III is based discloses the 
following facts. When the sixty-three pupils are measured by all 
four tests the mental ratings vary from a range of 6 I.Q. steps in 
the case of one pupil to a range of 52 I.Q. steps in the case of another. 
The range of agreement for the median pupil is 25 I.Q. steps. For 
the middle 50 per cent of the group the agreement expressed in 
range of I.Q. steps varies from 19 to 34. The best 25 per cent of 
agreement lies between a range of 6 and 19 I.Q. steps; the second
best, between 19 and 25; the third, between 25 and 34; while the
poorest quartile of agreement lies between a range of 34 and 52
I.Q. steps. In the literature on mental measurements one is at a
loss to find any standards as to what constitutes satisfactory agree- 
ment among measuring scales. In the absence of any such stand- 
ards it would seem that as far as the four tests here employed are 
concerned the table reveals disagreement rather than agreement in 
mental ratings. 
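
The tabulation behind Table III can be sketched as follows: for each pupil the range is the highest rating minus the lowest, and the ranges are then tallied in intervals of five I.Q. steps. The function names are assumptions for illustration; the four pupils and their ratings are taken from Table I.

```python
from collections import Counter


def iq_range(ratings):
    """Difference between the highest and lowest I.Q. rating of one pupil."""
    return max(ratings) - min(ratings)


def step_interval(steps, width=5):
    """Label of the interval (0-4, 5-9, ...) into which a range falls."""
    low = (steps // width) * width
    return f"{low}-{low + width - 1}"


# Terman, National, Illinois, and Pintner ratings of four pupils from Table I.
ratings = {
    1: (139, 120, 141, 155),
    2: (138, 122, 151, 166),
    8: (122, 108, 121, 100),
    30: (102, 96, 96, 102),
}

ranges = {pupil: iq_range(r) for pupil, r in ratings.items()}
tally = Counter(step_interval(steps) for steps in ranges.values())
print(ranges)  # {1: 35, 2: 44, 8: 22, 30: 6}
print(dict(tally))
```

Passing only three or two of a pupil's ratings to the same function gives the ranges tabulated for the smaller groupings of tests.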

An analysis of Table IV shows that when three tests are con- 
sidered, the best agreement in mental ratings is found in the case 
of the Terman, National, and Illinois tests. The inclusion of the 



TABLE IV

Amount of Agreement in Range of I.Q. Steps Shown in Ratings of Sixty-three
Children by Any Three of the Four Mental Tests

                                             Frequency

                        Terman,          Terman,          National,        Terman,
Range of I.Q. Steps     National, and    Illinois, and    Illinois, and    National, and
                        Illinois         Pintner          Pintner          Pintner

0-4                           4                2                1                0
5-9                          13                2                4                7
10-14                        13               12                9                7
15-19                        15               10               10                5
20-24                        10               13               13               14
25-29                         4                8                6                8
30-34                         3                7                6                7
35-39                         0                1                4                3
40-44                         1                2                6                7
45-49                         0                6                2                3
50-54                         0                0                2                2

Total                        63               63               63               63
Median                       15               21               23               24
Middle 50 per cent         9-21            14-30            16-34            17-34



Pintner ratings in any grouping of mental ratings by three of the 
tests invariably means less agreement in the ratings than when the 
Pintner ratings are not involved. The medians computed from 
the distributions in Table IV set forth these observations in a 
striking way. That better agreement is found among the mental 
ratings by the Terman, National, and Illinois tests than is the
case where the Pintner ratings are involved is indicated further 
by the agreement in ratings based on the middle 50 per cent. 

Table V presents findings similar to those pointed out in connection
with the discussion of Table IV. When two mental tests
are considered, the best agreement in ratings, as determined by
the median and the middle 50 per cent, is invariably found in those 
groupings where the Pintner ratings are not included. 

Briefly, an analysis of Tables III, IV, and V indicates, in the 
first place, that when the range of I.Q. steps is taken as the basis 
of agreement, sufficient agreement is not found among the mental 
ratings yielded by the use of all four, or any three, or even any 
two tests to warrant absolute dependence on the results of mental 
testing with the tests employed in this investigation. In the 

TABLE V

Amount of Agreement in Range of I.Q. Steps Shown in Ratings of Sixty-three
Children by Any Two of the Four Mental Tests

                                                  Frequency

                        Terman      Terman      National    Illinois    Terman      National
Range of I.Q. Steps     and         and         and         and         and         and
                        National    Illinois    Illinois    Pintner     Pintner     Pintner

0-4                        18          16          11          10           7           5
5-9                        20          20          18          12           8          10
10-14                      13          11          15           8           6           8
15-19                       5           8           9          12          10           5
20-24                       5           4           5           7          10          12
25-29                       1           3           3           5           8           4
30-34                       1           1           1           2           6           6
35-39                       0           0           1           5           0           4
40-44                       0           0           0           0           4           7
45-49                       0           0           0           2           4           0
50-54                       0           0           0           0           0           2

Total                      63          63          63          63          63          63
Median                      7           9          11          15          20          21
Middle 50 per cent       4-13        4-17        6-16        7-21       10-29       10-33



second place, when the same basis for agreement is employed 
among the ratings of three mental tests, the best agreement is 
found among the Terman, National, and Illinois tests. In the 
third place, the best agreement between any two measuring instru- 
ments is found in the case of the Terman Scale and the National 
Tests, the Terman Scale and the Illinois Examination, and the 
National Tests and the Illinois Examination, in the order given. 

We have come to recognize the fact that one of the essential 
factors in effective educational administration today is the homogeneous
grouping of children for purposes of instruction. Hence,
any means that will enable school people more accurately to select
children for such grouping is generally looked upon with favor. 
The literature on mental tests stresses this selective function as 
one of the most important services to be rendered by mental 
measurements. In order that we may ascertain the extent to 
which the tests agree in classifying children on the basis of ability 
to learn, Tables VI, VII, and VIII are presented. 

Table VI is the result of an attempt to discover the extent to 
which all four tests agree in the quartile placement of children. 
In other words, the table shows the extent to which all four tests 
agree in grouping sixty-three children in four equal sections. The 
table reads as follows: in selecting pupils for the first quartile all 

TABLE VI

Amount of Agreement among the Four Mental
Tests in the Quartile Placement of
Sixty-three Pupils

Quartile          Number of         Percentage of
                  Agreements        Agreements

1                  4 in 16              25.0
2                  1 in 16               6.3
3                  1 in 15               6.7
4                  8 in 16              50.0
All               14 in 63              22.2
1 and 4           12 in 32              37.5





four tests agree on four pupils in sixteen or 25 per cent. In the 
second, third, and fourth quartiles they agree on one pupil in six- 
teen, one pupil in fifteen, and eight pupils in sixteen, respectively. 
When all of the quartiles are considered, there is an agreement of 
22.2 per cent. When the first and fourth quartiles are considered, 
the agreement is 37.5 per cent. It is evident that the disagreements 
in the quartile placement of children are very marked when all
four of the tests are considered. The best agreement is found in 
the first and fourth quartiles. 

Table VII records the percentage of agreement found when any 
three tests are compared as to the quartile placement of children. 
Reference to the column headed "Terman, National, and Illinois" 
will indicate how the table reads. Thus these three tests agree on
69 per cent of the pupils for the upper quartile; 25 per cent for
the second quartile; 13 per cent for the third quartile; and 50 per 
cent for the lowest quartile. This table reveals certain significant 
facts. In the first place, the best agreement in quartile placement 
is found in the first and fourth quartiles. Second, in general, less 
agreement in quartile placement is found where the Pintner scale 
is included, except in the lowest quartile. Third, the agreement 
is small in the second and third quartiles. 

TABLE VII

Percentage of Agreement among Any Three of the Mental Tests in the
Quartile Placement of Sixty-three Pupils

                  Terman,          Terman,          Terman,          National,
Quartile          National, and    National, and    Illinois, and    Illinois, and
                  Illinois         Pintner          Pintner          Pintner

1                      69               25               25               38
2                      25                6               19               25
3                      13               26               13               13
4                      50               50               50               44
All                    40               27               27               30
1 and 4                59               38               38               41




TABLE VIII

Percentage of Agreement between Any Two of the Mental Tests in the
Quartile Placement of Sixty-three Pupils

                  Terman      Terman      Terman      National    National    Illinois
Quartile          and         and         and         and         and         and
                  National    Illinois    Pintner     Illinois    Pintner     Pintner

1                    75          69          31          81          38          38
2                    37          44          37          50          31          38
3                    27          40          47          47          33          20
4                    63          75          69          75          50          56
All                  51          57          46          64          38          38
1 and 4              69          72          50          78          44          47



Table VIII records the percentage of agreement between any 
two mental tests in the quartile placement of children. The best 
agreement is found in the case of the first and fourth quartiles. 
Again, the best agreement in quartile placement is found in con- 
nection with the National and Illinois, the Terman and Illinois, 
and the Terman and National tests, in the order given.

The following facts are indicated by the data included in Tables 
VI, VII, and VIII. First, much better agreement in the quartile
placement of pupils is found when the grouping is made on the
basis of the mental ratings yielded by certain combinations of two 
of the four tests. These combinations in the order of best agree- 
ment are: National and Illinois, Terman and Illinois, and Terman 
and National. Second, the best agreements occur in connection 
with the first and fourth quartiles. Third, the tests are of little
value in the placement of children in the middle quartiles. 

Analysis of the facts disclosed by this investigation seems to
justify the following conclusions: 

1. Mental measurements, in their present state of develop- 
ment, must not be accepted as the final gauge of mentality. This 
conclusion is made on the basis of the large amount of disagree- 
ment existing among the different tests employed in this 
investigation. 

2. It seems unwise to attempt to estimate mentality on the 
basis of a single mental examination. 

3. More attention needs to be given to the displacement of 
mental ratings. While the correlation between some of the dis- 
tributions of mental ratings is marked, the lack of agreement among 
specific ratings in the same distribution is equally striking. 

4. The greatest need in mental testing today seems to be the 
perfection of existing tests and scales. 

5. Mental tests render an important service in selecting children
of high and low mentality. Their usefulness is limited in the
middle quartiles. 

6. The best agreement in the quartile placement of children is 
found in the case of the National and Illinois tests. The Terman 
and Illinois tests and the Terman and National tests rank second 
and third, respectively. 



